Relational Analysis for Clustering Consensus

Authors

  • Mustapha Lebbah
  • Younès Bennani
  • Nistor Grozavu
  • Hamid Benhadda
Abstract

Clustering is one of the most widely used techniques in the data mining field. Its aim is to synthesize and summarize huge amounts of data by splitting them into small, homogeneous clusters, such that the observations inside the same cluster are more similar to each other than to the observations inside the other clusters. This definition assumes that there exists a well-defined clustering quality measure that quantifies how homogeneous the obtained clusters are. The aim of this chapter is to present an original approach to merging different partitions of the same data set, obtained either by applying different clustering techniques or by applying the same clustering technique with different parameters. Fusing partitions has been broadly studied and has been given several names, depending on the scientific field, such as machine learning or bioinformatics (Dudoit & Fridlyand, 2003; Kim & Lee, 2007; Monti et al., 2003). These names include consensus clustering, clustering aggregation, clustering combination, and fusion of clustering. Several studies (Frossyniotis et al., 2002; Minaei-Bidgoli et al., 2004; Strehl & Ghosh, 2002; Topchy et al., 2004; 2005) have pioneered clustering data sets as a new branch of the conventional clustering methodology. In (Topchy et al., 2004) the authors propose a probabilistic formalism of clustering consensus using a finite mixture of multinomial distributions in a space of clusterings. The approach proposed in (Frossyniotis et al., 2002) is designed for combining runs of clustering algorithms with the same number of clusters. In (Strehl & Ghosh, 2002) the authors propose combiners based on a hyper-graph model to solve the cluster fusion problem.
The authors discuss two forms of consensus clustering: (1) Feature-Distributed Clustering (FDC), in which a set of clusterings is obtained from partial views of the variables, using all observations; (2) Object-Distributed Clustering (ODC), in which each clustering in the ensemble is limited to a subset of the observations, with access to all variables. ...
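
As a rough illustration of what merging partitions of the same data set can involve, the sketch below encodes each partition as a pairwise "same cluster" relation and takes a majority vote over pairs. This is not the chapter's algorithm, which the abstract does not spell out; the function names `co_membership` and `majority_relation` and the toy label vectors are assumptions made for the example.

```python
from typing import List, Sequence


def co_membership(labels: Sequence[int]) -> List[List[int]]:
    """Pairwise relation of one partition: entry [i][j] is 1 when
    observations i and j carry the same cluster label, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def majority_relation(partitions: Sequence[Sequence[int]]) -> List[List[int]]:
    """Entry [i][j] is 1 when strictly more than half of the partitions
    place observations i and j in the same cluster."""
    m = len(partitions)
    n = len(partitions[0])  # all label vectors are assumed to have equal length
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        rel = co_membership(labels)
        for i in range(n):
            for j in range(n):
                votes[i][j] += rel[i][j]
    return [[1 if 2 * votes[i][j] > m else 0 for j in range(n)]
            for i in range(n)]


# Three hypothetical partitions of the same five observations.
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
consensus = majority_relation(partitions)
for row in consensus:
    print(row)
```

In this toy case the majority relation links observations 2 and 3, and 3 and 4, but not 2 and 4, so it is not transitive and does not define a partition by itself. This is one reason consensus clustering is usually posed as an optimization problem over valid partitions rather than as a simple pairwise vote.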

Similar resources

A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

Considering the concept of clustering, the main idea of the present study is that the stocks to be selected and ranked will not necessarily all fall in one cluster. With this in mind, the study aims to offer a new methodology for making decisions about forming a portfolio of stocks in the stock market. To this end, Multiple-Criteria Decisi...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Multiple Medoids based Multi-view Relational Fuzzy Clustering with Minimax Optimization

Multi-view data has become prevalent because more and more data can be collected from various sources. Each data set may be described by a different set of features and hence forms a multi-view data set, or multi-view data for short. To find the underlying pattern embedded in unlabelled multi-view data, many multi-view clustering approaches have been proposed. Fuzzy clustering in which a data ...

On aggregating binary relations using 0-1 integer linear programming

This paper is concerned with the general problem of aggregating many binary relations in order to find a consensus. The theoretical background we rely on is the Relational Analysis (RA) approach. This method represents binary relations (BRs) as adjacency matrices, models relational properties as linear equations and finds a consensus by maximizing a majority-based criterion using 0-1 i...

HINMF: A Matrix Factorization Method for Clustering in Heterogeneous Information Networks

Non-negative matrix factorization (NMF) has recently become quite popular for relational data because of its several nice properties and its connection to probabilistic latent semantic analysis (PLSA). However, few algorithms take this route for heterogeneous networks. In this paper we propose a novel clustering method for heterogeneous information networks by searching for a factorization that ...

Publication date: 2010